33 research outputs found
Enabling the Reuse of Electronic Health Record Data through Data Quality Assessment and Transparency
With the increasing adoption of health information technology and the growth in the resulting electronic repositories of clinical data, the secondary use of electronic health record data has become one of the most promising approaches to enabling and speeding clinical research. Unfortunately, electronic health record data are known to suffer from significant data quality problems. Awareness of the problem of electronic health record data quality is growing, but methods for measuring data quality remain ad hoc. Clinical researchers must handle this complicated problem without systematic or validated methods. The lack of appropriate or trustworthy electronic health record data quality assessment methodology limits the validity of research performed with electronic health record data.
This dissertation documents the development of a data quality assessment framework and guideline for clinical researchers engaged in the secondary use of electronic health record data for retrospective research. Through a systematic literature review and interviews with key stakeholders, we identified core constructs of data quality, as well as priorities for future approaches to electronic health record data quality assessment. We used a data-driven approach to demonstrate that data quality is task-dependent, indicating that appropriate data quality measures must be selected, applied, and interpreted within the context of a specific study. On the basis of these results, we developed and evaluated a dynamic guideline for data quality measures in order to help researchers choose data quality measures and methods appropriately within the context of reusing electronic health record data for research.
Hidden in plain sight: bias towards sick patients when sampling patients with sufficient electronic health record data for research
Background: To demonstrate that subject selection based on sufficient laboratory results and medication orders in electronic health records can be biased towards sick patients. Methods: Using electronic health record data from 10,000 patients who received anesthetic services at a major metropolitan tertiary care academic medical center, an affiliated hospital for women and children, and an affiliated urban primary care hospital, we assessed the correlation between patient health status, as indicated by the American Society of Anesthesiologists Physical Status Classification (ASA Class), and counts of days with laboratory results or medication orders, using a Negative Binomial Regression model. Results: Higher ASA Class was associated with more data points: compared to ASA Class 1 patients, ASA Class 4 patients had 5.05 times the number of days with laboratory results and 6.85 times the number of days with medication orders, controlling for age, sex, emergency status, admission type, primary diagnosis, and procedure. Conclusions: Imposing data sufficiency requirements for subject selection allows researchers to minimize missing data when reusing electronic health records for research, but introduces a bias towards the selection of sicker patients. We demonstrated the relationship between patient health and quantity of data, which may result in a systematic bias towards the selection of sicker patients for research studies and limit the external validity of research conducted using electronic health record data. Additionally, we discovered other variables (i.e., admission status, age, emergency classification, procedure, and diagnosis) that independently affect data sufficiency.
A Real-Time Screening Alert Improves Patient Recruitment Efficiency
The scarcity of cost-effective patient identification methods represents a significant barrier to clinical research. Research recruitment alerts have been designed to facilitate physician referrals, but limited support is available to clinical researchers. We conducted a retrospective data analysis to evaluate the efficacy of a real-time patient identification alert delivered to clinical research coordinators recruiting for a clinical prospective cohort study. Data from log analysis and informal interviews with coordinators were triangulated. Over a 12-month period, 11,295 patients were screened electronically, 1,449 were interviewed, and 282 were enrolled. The enrollment rates for the alert and two other conventional methods were 4.65%, 2.01%, and 1.34%, respectively. A taxonomy of eligibility status was proposed to precisely categorize research patients. Practical ineligibility factors were identified and their correlation with age and gender was analyzed. We conclude that the automatic prescreening alert improves screening efficiency and is an effective aid to clinical research coordinators.
Electronic health record data quality assessment and tools: A systematic review
OBJECTIVE: We extended a 2013 literature review on electronic health record (EHR) data quality assessment approaches and tools to determine recent improvements or changes in EHR data quality assessment methodologies.
MATERIALS AND METHODS: We completed a systematic review of PubMed articles from 2013 to April 2023 that discussed the quality assessment of EHR data. We screened and reviewed papers for the dimensions and methods defined in the original 2013 manuscript. We categorized papers as data quality outcomes of interest, tools, or opinion pieces. We abstracted and defined additional themes and methods through an iterative review process.
RESULTS: We included 103 papers in the review, of which 73 were data quality outcomes of interest papers, 22 were tools, and 8 were opinion pieces. The most common dimension of data quality assessed was completeness, followed by correctness, concordance, plausibility, and currency. We abstracted conformance and bias as 2 additional dimensions of data quality and structural agreement as an additional methodology.
DISCUSSION: There has been an increase in EHR data quality assessment publications since the original 2013 review. Consistent dimensions of EHR data quality continue to be assessed across applications. Despite consistent patterns of assessment, there still does not exist a standard approach for assessing EHR data quality.
CONCLUSION: Guidelines are needed for EHR data quality assessment to improve the efficiency, transparency, comparability, and interoperability of data quality assessment. These guidelines must be both scalable and flexible. Automation could be helpful in generalizing this process.
Issues With Variability in Electronic Health Record Data About Race and Ethnicity: Descriptive Analysis of the National COVID Cohort Collaborative Data Enclave
Background: The adverse impact of COVID-19 on marginalized and under-resourced communities of color has highlighted the need for accurate, comprehensive race and ethnicity data. However, a significant technical challenge related to integrating race and ethnicity data in large, consolidated databases is the lack of consistency in how data about race and ethnicity are collected and structured by health care organizations.
Objective: This study aims to evaluate and describe variations in how health care systems collect and report information about the race and ethnicity of their patients and to assess how well these data are integrated when aggregated into a large clinical database.
Methods: At the time of our analysis, the National COVID Cohort Collaborative (N3C) Data Enclave contained records from 6.5 million patients contributed by 56 health care institutions. We quantified the variability in the harmonized race and ethnicity data in the N3C Data Enclave by analyzing the conformance to health care standards for such data. We conducted a descriptive analysis by comparing the harmonized data available for research purposes in the database to the original source data contributed by health care institutions. To make the comparison, we tabulated the original source codes, enumerating how many patients had been reported with each encoded value and how many distinct ways each category was reported. The nonconforming data were also cross tabulated by 3 factors: patient ethnicity, the number of data partners using each code, and which data models utilized those particular encodings. For the nonconforming data, we used an inductive approach to sort the source encodings into categories. For example, values such as “Declined” were grouped with “Refused,” and “Multiple Race” was grouped with “Two or more races” and “Multiracial.”
Results: “No matching concept” was the second largest harmonized concept used by the N3C to describe the race of patients in their database. In addition, 20.7% of the race data did not conform to the standard; the largest category was data that were missing. Hispanic or Latino patients were overrepresented in the nonconforming racial data, and data from American Indian or Alaska Native patients were obscured. Although only a small proportion of the source data had not been mapped to the correct concepts (0.6%), Black or African American and Hispanic/Latino patients were overrepresented in this category.
Conclusions: Differences in how race and ethnicity data are conceptualized and encoded by health care institutions can affect the quality of the data in aggregated clinical databases. The impact of data quality issues in the N3C Data Enclave was not equal across all races and ethnicities, which has the potential to introduce bias in analyses and conclusions drawn from these data. Transparency about how data have been transformed can help users make accurate analyses and inferences and eventually better guide clinical care and public policy.
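The tabulation step described in the Methods above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `tabulate_source_codes`, the `CATEGORY_OF` lookup, the "Uncategorized" fallback, and the record layout are all assumptions made for illustration. Only the two example groupings ("Declined" with "Refused"; "Multiple Race" with "Two or more races" and "Multiracial") come from the abstract. For each category, the sketch counts how many patients were reported and how many distinct source encodings were used.

```python
from collections import defaultdict

# Hypothetical lookup from normalized source values to categories.
# Only these two groupings are taken from the abstract; a real
# mapping would be far larger and derived inductively from the data.
CATEGORY_OF = {
    "declined": "Refused",
    "refused": "Refused",
    "multiple race": "Multiracial",
    "two or more races": "Multiracial",
    "multiracial": "Multiracial",
}


def tabulate_source_codes(records):
    """Tabulate (patient_id, source_value) pairs by category.

    Returns two dicts keyed by category: the number of distinct
    patients, and the number of distinct source encodings seen.
    """
    patients = defaultdict(set)
    encodings = defaultdict(set)
    for patient_id, source_value in records:
        cleaned = source_value.strip()
        # Values missing from the lookup fall into an assumed catch-all.
        category = CATEGORY_OF.get(cleaned.lower(), "Uncategorized")
        patients[category].add(patient_id)
        encodings[category].add(cleaned)
    patient_counts = {c: len(ids) for c, ids in patients.items()}
    encoding_counts = {c: len(vals) for c, vals in encodings.items()}
    return patient_counts, encoding_counts


# Made-up example records, not data from the N3C Data Enclave.
example_records = [
    (1, "Declined"),
    (2, "Refused"),
    (3, "Multiple Race"),
    (4, "Two or more races"),
    (5, "Multiracial"),
    (6, "Unknown code"),
]
patient_counts, encoding_counts = tabulate_source_codes(example_records)
```

With the made-up records above, both returned dictionaries report 2 for "Refused", 3 for "Multiracial", and 1 for "Uncategorized". Keeping the two counts separate mirrors the distinction the abstract draws between how many patients carry a value and how many different ways a category was encoded.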
Identification of Conserved and HLA Promiscuous DENV3 T-Cell Epitopes
Anti-dengue T-cell responses have been implicated in both protection and immunopathology. However, most of the T-cell studies for dengue include few epitopes, with limited knowledge of their inter-serotype variation and the breadth of their human leukocyte antigen (HLA) affinity. In order to expand our knowledge of HLA-restricted dengue epitopes, we screened T-cell responses against 477 overlapping peptides derived from structural and non-structural proteins of the dengue virus serotype 3 (DENV3) by use of HLA class I and II transgenic mice (TgM): A2, A24, B7, DR2, DR3 and DR4. TgM were inoculated with peptide pools and the T-cell immunogenic peptides were identified by ELISPOT. Nine HLA class I and 97 HLA class II novel DENV3 epitopes were identified based on immunogenicity in TgM, and their HLA affinity was further confirmed by binding assay analysis. A subset of these epitopes activated memory T-cells from DENV3-immune volunteers and was also capable of priming naïve T-cells, ex vivo, from dengue IgG-negative individuals. Analysis of inter- and intra-serotype variation of one such epitope (A02-restricted) allowed us to identify altered peptide ligands not only in DENV3 but also in other DENV serotypes. These studies also characterized the HLA promiscuity of 23 HLA class II epitopes bearing highly conserved sequences, six of which could bind to more than 10 different HLA molecules representing a large percentage of the global population. These epitope data are invaluable for investigating the role of T-cells in dengue immunity/pathogenesis and for vaccine design. © 2013 Nascimento et al.